npj Genomic Medicine

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match npj Genomic Medicine's content profile, based on 33 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Ancestry-stratified variant classification in monogenic diabetes genes: annotation coverage and differential curation burden

Dario, P.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350230 medRxiv
Top 0.1%
7.2%

Variant databases ClinVar and gnomAD are the backbone of clinical variant interpretation, but their population composition is skewed toward European ancestry. Whether this skew creates systematic classification disadvantages for non-European patients with monogenic diabetes has not been examined at the database level. ClinVar variant_summary (GRCh38, April 2026; 4,421,188 variants) was cross-referenced with gnomAD v4.0 genome data for 17 monogenic diabetes genes. Annotation coverage and variant classification rates were computed stratified by genetic ancestry group (AFR, AMR, EAS, SAS, MID, NFE, FIN, ASJ). Of 14,691 gnomAD variants across the 17 genes, only 29.7% had any ClinVar classification (range: 12.7%-61.3% by gene). Among classified variants, non-Finnish European (NFE) variants had the highest variant of uncertain significance (VUS) rate (32.1%) and the lowest benign/likely benign fraction (41.6%), consistent with a large submission volume without functional follow-up. African-ancestry (AFR) variants showed the second-highest VUS rate (29.2%), not statistically distinguishable from NFE after Bonferroni correction, while all other non-European groups had significantly lower rates (all p < 0.001). GCK showed a pattern inversion - non-European VUS rate (18.5%) exceeding European (15.0%) - consistent with progressive reclassification in European populations absent in non-European cohorts. Annotation coverage and VUS divergence were uncorrelated (r = -0.15, p = 0.57). The primary equity problem is a 70% annotation gap combined with a non-European curation deficit, not a simple VUS excess. Ancestry-stratified evaluation of ClinGen Variant Curation Expert Panel (VCEP) criteria performance is warranted across disease domains.

2
FRMPD4, a causal gene for intellectual disability and epilepsy, is associated with X-linked non-syndromic hearing loss

Liedtke, D.; Rak, K.; Schrode, K. M.; Hehlert, P.; Chamanrou, N.; Bengl, D.; Katana, R.; Heydaran, S.; Doll, J.; Han, M.; Nanda, I.; Senthilan, P. R.; Juergens, L.; Bieniussa, L.; Voelker, J.; Neuner, C.; Hofrichter, M. A.; Schroeder, J.; Schellens, R. T.; de Vrieze, E.; van Wijk, E.; Zechner, U.; Herms, S.; Hoffmann, P.; Mueller, T.; Dittrich, M.; Bartsch, O.; Krawitz, P. M.; Klopocki, E.; Shehata-Dieler, W.; Maroofian, R.; Wang, T.; Worley, P. F.; Goepfert, M. C.; Galehdari, H.; Lauer, A. M.; Haaf, T.; Vona, B.

2026-03-30 genetic and genomic medicine 10.64898/2026.03.27.26349271 medRxiv
Top 0.1%
5.0%

Background: Understanding the phenotypic spectrum of disease-associated genes is essential for accurate diagnosis and targeted therapy. FRMPD4 (FERM and PDZ Domain Containing 4) has previously been associated with intellectual disability and epilepsy. However, its potential role in non-syndromic hearing loss has not been explored. Methods: We performed genetic analysis in two unrelated families presenting with non-syndromic sensorineural hearing loss, identifying maternally inherited missense variants in FRMPD4. Clinical phenotyping included audiological assessment and evaluation for neurodevelopmental involvement. Cross-species expression analyses were conducted in Drosophila, zebrafish, and mouse. Functional characterization included quantitative evaluation of sound-evoked responses in Drosophila nicht gut hoerend (ngh) mutants, assessment of neuronal development and acoustic startle responses in zebrafish loss-of-function models, and morphological cochlear analyses with auditory brainstem response measurements in knockout mice. Results: Three affected males from two unrelated families presented with prelingual, bilaterally symmetrical sensorineural hearing loss, with confirmed congenital onset in one individual and no evidence of neurodevelopmental abnormalities. Cross-species analyses demonstrated evolutionarily conserved expression of FRMPD4 in auditory structures. In Drosophila, quantitative analysis of sound-evoked responses in ngh mutants revealed impaired auditory function. Zebrafish loss-of-function models exhibited reduced neuronal populations in the otic vesicle and posterior lateral line, abnormal neuromast development, and diminished acoustic startle responses. In mice, Frmpd4 knockout resulted in high-frequency hearing loss and cochlear abnormalities consistent with the human phenotype. Conclusions: Our findings expand the phenotypic spectrum of FRMPD4 to include non-syndromic sensorineural hearing loss and establish its evolutionarily conserved role in auditory function. These results have direct implications for genetic diagnosis and variant interpretation in patients with hearing loss.

3
PAVS: A Standardized Database of Phenotype-Associated Variants from Saudi Arabian Rare Disease Patients

Abdelhakim, M.; Althagafi, A.; Schofield, P.; Hoehndorf, R.

2026-04-06 genetic and genomic medicine 10.64898/2026.04.05.26350189 medRxiv
Top 0.1%
3.7%

Genotype-phenotype databases are essential for variant interpretation and disease gene discovery. Genetic variation differs among human populations, mainly in allele frequencies and haplotype patterns shaped by ancestry and demographic history. Population-specific genotypes can influence traits and disease risk, which makes population-specific characterization important. Most existing resources focus on the characterization of a population's genetic background, but do not represent the resulting phenotypes. We have developed PAVS (Phenotype-Associated Variants in Saudi Arabia), a curated, publicly accessible database that integrates 5,132 Saudi clinical cases from four Saudi cohorts and 522 cases from analysis of a mixed-population cohort, together with 1,856 cases from the Deciphering Developmental Disorders study (DDD) and 9,588 literature phenopackets. Each case record describes patient-level phenotypes, encoded with the Human Phenotype Ontology (HPO), and links them to genomic variants, gene identifiers, zygosity, pathogenicity classifications, and disease diagnoses mapped to standardized disease terminologies. The data are represented in Phenopackets format and as a knowledge graph in RDF. Additionally, a web interface provides phenotype-based similarity search, gene and variant browsers, and an HPO hierarchy explorer. We evaluate the utility of the phenotype annotations for gene prioritization using semantic similarity. Although there are clear differences from global literature-curated databases, phenotypes in PAVS successfully place the correct gene at a high rank (ROC AUC: 0.89). PAVS addresses a gap in population-specific genotype-phenotype resources and provides a benchmark for phenotype-driven variant prioritization in under-represented populations.

4
Health Impact Assessment of BRCA1/2 Cascade Screening for the Personalized Prevention of Hereditary Breast and Ovarian Cancers in Italy

Valz Gris, A.; Giacobini, E.; Tricomi, V.; Rumi, F.; Valentini, I.; Cristiano, A.; Testa, S.; Rosano, A.; Pezzullo, A. M.; Boccia, S.

2026-04-15 public and global health 10.64898/2026.04.13.26350758 medRxiv
Top 0.1%
3.6%

Introduction: Pathogenic germline variants in the BRCA1 and BRCA2 genes confer a markedly increased risk of breast and ovarian cancer, for which effective preventive strategies are available. Although national and international guidelines recommend BRCA testing and cascade screening of relatives, implementation in Italy remains highly heterogeneous across regions. This study estimates the potential population health and cost impact of achieving full nationwide implementation of BRCA1/2 cascade screening in Italy and identifies key organisational barriers and priority actions for implementation. Methods: We conducted a Health Impact Assessment integrating literature review, simulation modelling, and stakeholder consultation. A decision tree and Markov model compared the current heterogeneous implementation of BRCA screening in Italy with an ideal scenario reflecting full adherence to national guidelines, optimal cascade screening, and uptake of preventive strategies. Outcomes included breast and ovarian cancer incidence and mortality, as well as healthcare costs over a lifetime horizon (80 years). Key barriers affecting organisational feasibility, acceptability, and patient well-being were assessed, and a set of priority action recommendations was developed. Results: In the ideal scenario, 25,626 eligible cancer patients would undergo BRCA testing annually, identifying 4,254 mutation carriers and enabling cascade testing of 27,650 relatives, of whom 8,682 would be BRCA-positive. Under the current implementation, only 8,807 patients and 2,168 relatives are tested, identifying 948 carriers. Over 30 years, full implementation would prevent 821 cancer cases (-27.9%) and 1,282 deaths (-49.7%) compared with the current scenario. While initial expenditures increase due to expanded testing and preventive interventions, cumulative costs decrease over time, resulting in net savings of 5.8 million euros at 30 years and a saving of 2,779 euros per event avoided. Major implementation barriers include fragmented governance, limited access to genetic counselling, heterogeneous laboratory practices, insufficient professional training, and weak referral pathways. Conclusion: Full implementation of BRCA1/2 cascade screening in Italy would yield substantial population health benefits and long-term cost savings. Coordinated national governance, standardised pathways, investment in counselling and workforce capacity, and robust monitoring systems are essential to ensure equitable access and sustainable delivery of personalised cancer prevention. This study demonstrates the value of the HIA methodology for evaluating and guiding genomic prevention policies.

5
Bulk RNA sequencing deconvolution of pancreatic ductal adenocarcinoma identifies cancer-associated fibroblast subsets associated with survival and tumor microenvironment composition

Dam, N.; Steketee, M. F. B.; Strijk, G.; Koning, W. d.; Hawinkels, L. J. A. C.; Kemp, V.; Eijck, C. H. J. v.; Kim, Y.; Eijck, C. W. F. v.; Os, B. W. v.

2026-04-06 cancer biology 10.64898/2026.04.03.716260 medRxiv
Top 0.1%
3.6%

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer characterized by a high abundance of cancer-associated fibroblasts (CAFs), which influence therapy response, tumor biology and tumor aggressiveness. CAFs are a heterogeneous cell type and previous single-cell RNA sequencing (scRNAseq) of PDAC tumors identified three main CAF subtypes: myofibroblastic, inflammatory and antigen-presenting CAFs (myCAF, iCAF, apCAF, respectively). However, scRNAseq on large patient cohorts is often not feasible due to costs and technical constraints. Therefore, bulk RNAseq deconvolution can be used to identify cell types within the heterogeneous tumor microenvironment. Here, Statescope deconvolution was used to identify different cell types of the tumor microenvironment within an early onset PDAC cohort, comprising 74 patients aged under 60. Three CAF populations were identified (iCAFs, myCAFs and desmoplastic CAFs), and their correlations with tumor microenvironment components, mutational signatures and survival were examined. iCAFs were associated with classical-like tumor cells, whereas myCAFs and desmoplastic CAFs correlated with basal-like tumor cells. Desmoplastic CAFs are associated with inflammatory granulocytes/neutrophils, while negatively associating with monocyte-derived macrophages and immature/transitional B cells. No associations were observed between mutational signatures and the abundance of CAF and epithelial tumor subtypes. Interestingly, a high abundance of CAFs, and specifically increased iCAF abundance, was associated with improved survival. This iCAF-mediated survival effect was predominantly apparent in female patients. All in all, deconvolution of bulk RNA sequencing data, followed by its integration with clinical and biological parameters, reveals the heterogeneity and prognostic implications of CAF subpopulations in the tumor microenvironment of early onset PDAC patients.

6
Evaluating splicing factor and kinase network crosstalk through global phosphoproteomics

Crowl, S.; Singh, S.; Zhang, T.; Naegle, K. M.

2026-04-21 systems biology 10.64898/2026.04.16.718710 medRxiv
Top 0.1%
3.6%

Both splicing and kinase signaling are biochemical processes that fundamentally determine and shape cell physiology. Although there has been some indication that the two interact (splicing can alter the availability of exons encoding kinase targets, and kinases can phosphorylate splicing factors), the extent to which altering splicing factor expression impacts kinase signaling networks has yet to be established. In this work, we implemented a data-driven analysis using ENCODE RNA-sequencing data and prior work mapping post-translational modifications onto splice events to identify candidate splice factor perturbations that show extensive alterations to phosphorylation-encoding protein products. We then replicated the ENCODE knockdown experiments and performed global phosphoproteomics for two candidates, U2AF1 and SRSF3, complementing the transcription-level data. Both knockdowns showed extensive changes in phosphorylation and kinase activities, both basally and upon receptor tyrosine kinase stimulation. U2AF1 knockdown drove decreased JNK-associated cell death signaling but elevated chromosome regulation through CSNK2A1, PLK, and EIF2AK4 activity. SRSF3 knockdown, on the other hand, led to decreased cell cycle signaling through CDK and HIPK2 but increased cytoskeletal signaling through various PAKs. In addition, we found a striking enrichment of phosphorylated splicing regulators in both knockdowns that were linked to their splicing activity, such as HNRNPC, suggesting potential feedback and crosstalk between splice factors through signaling pathway activation. Importantly, comparison of differential phosphorylation measurements from this study to mRNA expression and splicing measurements from ENCODE revealed significant knockdown-dependent protein regulation not captured by transcriptomic measurements alone, underscoring the value of phosphoproteomic profiling after splice factor perturbations. Combined, the transcriptomics and phosphoproteomics reveal a deep interconnection between the two processes that is relevant to understanding cell signaling in health and disease.

7
Comparative fine-mapping of breast cancer susceptibility loci using summary statistics methods and multinomial regression

O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.

2026-04-22 epidemiology 10.64898/2026.04.21.26351364 medRxiv
Top 0.1%
3.6%

Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer, respectively. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B, and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.

8
Detection of Candidate Circular RNAs to Monitor Anti-Hormonal Response in the Mammary Gland

Trummer, N.; Weyrich, M.; Ryan, P.; Furth, P. A.; Hoffmann, M.; List, M.

2026-03-30 cancer biology 10.64898/2026.03.26.714379 medRxiv
Top 0.2%
3.5%

Anti-hormonal therapies such as selective estrogen receptor modulators like tamoxifen or aromatase inhibitors like letrozole represent a cornerstone for breast cancer prevention and therapy of estrogen receptor-positive breast cancer. Therapeutic monitoring can include blood tests and imaging; however, genetically-based approaches are not yet in practice. Ideally, a test would be able to detect a positive molecular response across different estrogen pathway-suppressive approaches. Circular RNAs are a species of non-coding RNAs detectable in plasma that have been proposed as non-invasive therapeutic biomarkers. To determine whether a set of specific circular RNAs is altered across estrogen-suppressive pathway approaches, we analyzed mammary gland-specific total RNA sequencing data from two individual genetically engineered mouse models (GEMMs) of estrogen pathway-induced breast cancer, with or without exposure to tamoxifen or letrozole. The nf-core/circrna pipeline was used to identify circRNAs that were differentially expressed in response to either tamoxifen or letrozole. We then screened for circRNAs that were differentially regulated by both anti-hormonals. Four up-regulated and 31 down-regulated circRNAs with host genes known to be expressed in human breast epithelial cells were identified as showing reproducible differential regulation in response to anti-hormonal treatment.

9
Assessing Swedish Genetic Counselling Outcome Measures for Autism and General Use: Rasch Findings Highlight the Need for Improved Measures

Nordstrand, M.; Fajutrao Falk, S.; Johansson, M.; Pestoff, R.; Tammimies, K.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.13.26350766 medRxiv
Top 0.2%
3.1%

Genetic counselling outcome measures are increasingly adapted for diverse clinical contexts. While the Genetic Counselling Outcome Scale (GCOS-24) is available in Swedish, no autism-specific version has been developed. Therefore, we adapted the Swedish GCOS-24 using the English version of the modified GCOS-24 (mGCOS-24) to create a Swedish autism-specific mGCOS-24. Thereafter, we evaluated both the Swedish autism mGCOS-24 and the Swedish general GCOS-24 using Rasch analysis to assess their psychometric properties. Both instruments exhibited structural challenges, including multidimensionality, disordered thresholds, local item dependence, and invariance issues. For the Swedish autism mGCOS-24, we were able to identify subscales with acceptable measurement properties. However, applying the same structure to the Swedish general GCOS-24 did not resolve its broader limitations. This study introduces the first Swedish autism-specific mGCOS-24 and represents the first Rasch-based evaluation of any GCOS-24 or mGCOS-24 in Swedish. Our findings highlight important opportunities for measure refinement but also indicate that new or more substantially adapted tools may be needed to capture outcomes of genetic counselling in autistic populations.

10
Duplication within 14q32.13 implicates a chimeric CLMN::SYNE3 RNA transcript in cerebellar ataxia

Litster, T. M.; Wilcox, R. A.; Carroll, R.; Gardner, A. E.; Nazri, N. M.; Shoubridge, C. A.; Delatycki, M. B.; Lohmann, K.; Agzarian, M.; Turella Divani, R.; Rafehi, H.; Scott, L.; Monahan, G.; Lamont, P. J.; Ashton, C.; Laing, N. G.; Ravenscroft, G.; Bahlo, M.; Haan, E.; Lockhart, P. J.; Friend, K. L.; Corbett, M. A.; Gecz, J.

2026-04-24 genetic and genomic medicine 10.64898/2026.04.23.26350376 medRxiv
Top 0.2%
3.0%

The spinocerebellar ataxias (SCAs) are a clinically heterogeneous group of neurodegenerative disorders that affect movement, vision, speech and balance. Here, we reassign the linkage of SCA30 to 14q32.13 based on a cumulative LOD score >12. Within this interval we identified a 331 kb duplication, absent in population controls and not observed in >800 unrelated individuals with genetically unresolved cerebellar ataxia. RNASeq analysis of patient-derived lymphoblastoid cell lines revealed a splice-mediated chimeric transcript resulting from the duplication event. This transcript joined exon 1 of CLMN to exon 2 of SYNE3. In silico translation predicted that this chimeric transcript would produce a short N-terminal peptide corresponding to exon 1 of CLMN and the usually untranslated region of exon 2 of SYNE3 fused to the complete and in-frame SYNE3 protein. Transient overexpression of SYNE3 or the CLMN::SYNE3 fusion protein, in both HeLa cells and mouse primary cortical neurons, resulted in equivalent cellular outcomes including altered nuclear morphology and chromosomal DNA fragmentation. SYNE3 forms part of the linker of nucleoskeleton and cytoskeleton complex and is not usually expressed in cerebellar Purkinje neurons, while CLMN has a Purkinje-specific expression pattern within the brain. Our data suggest that ectopic expression of SYNE3 in cerebellar Purkinje neurons, mediated by the CLMN promoter, leads to cerebellar atrophy and causes spinocerebellar ataxia in the SCA30 family. This is an example of Mendelian disease arising from a novel, chimeric transcript with a likely dominant negative effect. Chimeric transcripts are commonly associated with cancers, but they are not often associated with monogenic disorders. Detection of chimeric transcripts as part of structural variant analysis could increase the genetic diagnostic yield of Mendelian disorders.

11
Polygenic risk scores enhance the identification of carriers of monogenic forms of idiopathic pulmonary fibrosis

Alonso-Gonzalez, A.; Jaspez, D.; Lorenzo-Salazar, J. M.; Delgado, A.; Quintero-Bacallado, A.; Ma, S.-F.; Strickland, E.; Mychaleckyj, J.; Kim, J. S.; Huang, Y.; Adegunsoye, A.; Oldham, J. M.; Maher, T. M.; Guillen-Guio, B.; Wain, L. V.; Allen, R. J.; Saini, G.; Jenkins, R. G.; Molina-Molina, M.; Zhang, D.; Kim Garcia, C.; Martinez, F. J.; Noth, I.; Flores, C.

2026-04-18 genetic and genomic medicine 10.64898/2026.04.16.26350967 medRxiv
Top 0.2%
2.8%

Background: Idiopathic pulmonary fibrosis (IPF) is a rare disease with a poor prognosis. Disease risk involves rare and common genetic variants. However, an inverse association has been described between them. Accordingly, IPF patients with a higher polygenic risk score (PRS) for IPF are less likely to carry rare deleterious variants and vice versa. Here, we evaluate whether the PRS of IPF could serve as an additional criterion for patient prioritisation for rare variant discovery. Methods: We identified carriers based on the presence of rare qualifying variants (QVs) in genes linked to monogenic forms of pulmonary fibrosis in 888 IPF patients from the Pulmonary Fibrosis Foundation Patient Registry (PFF-PR). Genome-wide association study (GWAS) summary statistics from independent cohorts were used to construct a whole-genome PRS (WG-PRS) using a clumping and thresholding method (C+T) and a Bayesian method (SBayesRC). PRS were also derived from 19 known common sentinel IPF variants (Sentinel-PRS). Logistic regression models were used to evaluate associations between PRS and carrier status. Discriminatory performance was evaluated using area under the curve (AUC) analysis, and comparisons were made with the DeLong test. Validation was performed in 472 IPF individuals from the UK PROFILE cohort. Results: IPF-PRS were strongly associated with QV carrier status: Odds Ratio [OR] 0.65 (95% Confidence Interval [CI] 0.53-0.79) for WG-PRS(C+T), OR 0.71 (95% CI 0.59-0.86) for WG-PRS(SBayesRC), and OR 0.77 (95% CI 0.63-0.94) for Sentinel-PRS. Adding WG-PRS to the patient's personal clinical history improved the prediction of QV carriers: AUC=0.62 for the clinical model, AUC=0.68 for WG-PRS(C+T) (DeLong test, p=9.54×10⁻⁴) and AUC=0.66 for WG-PRS(SBayesRC) (DeLong test, p=0.02). Adding IPF-PRS to clinical variables correctly reclassified 22.8% of carriers when using WG-PRS(C+T), 20.8% when using Sentinel-PRS, and 16.7% when using WG-PRS(SBayesRC). WG-PRS(SBayesRC) and the Sentinel-PRS also demonstrated improved prediction of QV carriers in telomere-related genes in PROFILE. Conclusions: Incorporating IPF-PRS into a model based on the patient's clinical history improves the identification of QV carriers. Although the overall discriminatory power was moderate, these findings raise the possibility of using WG-PRS as a useful criterion for rare variant discovery in patients with IPF and for enhancing decision-making.

12
Epigenetic Signatures in Monozygotic and Dizygotic Twins Discordant for Orofacial Clefts

Petrin, A. L.; Keen, H. L.; Dunlay, L.; Xie, X. J.; Zeng, E.; Butali, A.; Wilcox, A.; Marazita, M. L.; Murray, J. C.; Moreno-Uribe, L.

2026-04-08 genetic and genomic medicine 10.64898/2026.04.07.26350251 medRxiv
Top 0.2%
2.6%

Introduction: Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is a common congenital malformation with complex etiology involving both genetic and environmental factors. Epigenetic mechanisms may mediate environmental contributions, but separating genetic from environmental effects remains challenging. Methods: We present an epigenome-wide association study of blood and saliva samples from 32 monozygotic and 22 dizygotic twin pairs discordant for NSCL/P. Differential methylation analysis was conducted using linear models to identify CpG sites showing significant methylation differences between affected and unaffected twins, followed by functional annotation and pathway enrichment analysis. Results: The top-ranked finding is a differentially methylated region comprising two CpG sites at the CYP26A1 locus, cg12110262 (P = 3.21×10⁻⁷) and cg15055355 (P = 1.39×10⁻³). CYP26A1 is essential for retinoic acid catabolism and craniofacial patterning. The chromatin regulator ANKRD11, which causes KBG syndrome featuring cleft palate, was the second-best hit. Differentially methylated CpG sites showed significant enrichment in craniofacial enhancers and overlap with multiple GWAS-validated cleft genes including VAX1, PVRL1, SMAD3, and PRDM16. Conclusions: Our findings implicate retinoic acid signaling and chromatin regulation in NSCL/P etiology and demonstrate the value of discordant twin designs for distinguishing environmental from genetic epigenetic contributions to complex malformations.

13
Reusing Blood Samples from a Hospital-based Cohort to Apixaban Plasma Concentrations

Murray, K. T.; Fabbri, D. V.; Annis, J. S.; Clark, C. R.; Pulley, J. M.; Brittain, E.; Gailani, D.

2026-04-08 pharmacology and therapeutics 10.64898/2026.04.07.26350322 medRxiv
Top 0.2%
2.6%

In the management of atrial fibrillation (AF), the most frequently prescribed oral anticoagulant is apixaban, given at a fixed dose of 5 mg BID. Apixaban is predominantly metabolized by cytochrome P450 3A4 (CYP3A4) and is also a substrate for the drug efflux transporter P-glycoprotein (P-gp). In nearly 300,000 Medicare patients with AF receiving apixaban, we previously showed that concomitant therapy with drugs that inhibit both CYP3A4 and P-gp, specifically amiodarone or diltiazem, significantly increased serious bleeding that caused hospitalization and/or death. We hypothesized that this adverse effect was mediated by an increase in apixaban plasma concentrations caused by concomitant therapy that reduced drug elimination. Utilizing left-over samples obtained from clinically indicated blood draws that would typically be discarded, the Vanderbilt University Medical Center biobank BioVU contains >353,000 samples linked to de-identified electronic medical records (EMRs), with both DNA and plasma harvested. Of 35 samples drawn from patients taking apixaban 5 mg BID, 5 were identified as drawn from patients concomitantly taking drugs inhibiting both CYP3A4 and P-gp. Using a chromogenic anti-Xa assay, we found that plasma concentrations of apixaban were significantly higher (347 ± 64 ng/mL; mean ± SEM) for patients receiving concomitant CYP3A4/P-gp-inhibiting drugs compared to those not treated with these drugs (166 ± 67 ng/mL; P=0.025, Mann-Whitney). There were no differences between the 2 patient groups with respect to age, weight, or serum creatinine. The results of this pilot study provide preliminary data to support our hypothesis, and they demonstrate the practicality of obtaining pharmacokinetic data from a large cohort of plasma samples linked to de-identified EMRs. This approach could be used to define the role of apixaban levels in high-risk clinical scenarios and to better understand the relationship between drug levels and bleeding risk.

14
Invasive cervical cancers after an HPV-negative test: insights from screening histories

Hassan, S. S.; Nordqvist-Kleppe, S.; Asinger, N.; Wang, J.; Dillner, J.; Arroyo Muhr, L. S.

2026-04-13 public and global health 10.64898/2026.04.11.26350679 medRxiv
Top 0.2%
2.6%

Human papillomavirus (HPV) testing is the primary method for cervical cancer screening, and a negative HPV test is associated with a very low subsequent risk of invasive cancer. Nevertheless, a small number of cervical cancers are diagnosed following an HPV-negative test result, posing challenges within HPV-based screening pathways. Using nationwide Swedish registry data on HPV testing, we identified women diagnosed with invasive cervical cancer between 2019 and 2024 and reconstructed HPV testing histories from the National Cervical Screening Registry (NKCx). The most recent HPV test prior to diagnosis was defined as the index test, and longitudinal HPV testing trajectories were classified among women with an HPV-negative index test. Of 3,000 women diagnosed with invasive cancer, 243 (8.1%) had an HPV-negative index test. These women were older at diagnosis and more frequently diagnosed at advanced stages compared with women with an HPV-positive index test. Most HPV-negative index tests (66.3%) were performed in the peri-diagnostic period (+/- 30 days). Among women with an HPV-negative index test, 52.7% (128/243) had no prior HPV testing recorded, while the remainder had consistently HPV-negative histories (33.3%, 83/243) or evidence of prior HPV positivity before the index negative test (14%, 32/243). Possible recurrent HPV positivity following an intervening negative test was rare (0.4%, 1/243). HPV-negative screening results preceding invasive cancer reflect heterogeneous screening histories and cannot be explained solely by test failure. The findings highlight the importance of reaching women earlier in screening programs and show that fluctuating HPV detectability is rare.

15
Genetic confounding in the associations between maternal health and autism

Arildskov, E. S.; Ahlqvist, V. H.; Khachadourian, V.; Asgel, Z.; Schendel, D.; Hansen, S. N.; Grove, J.; Janecka, M.

2026-04-17 epidemiology 10.64898/2026.04.16.26351033 medRxiv
Top 0.2%
2.5%

The etiology of autism is influenced by genetic and non-genetic factors, with observational studies suggesting associations between early maternal health diagnoses and offspring autism. However, these associations may partly reflect shared familial genetic liability rather than direct causal effects. Using comprehensive national health registers and individual-level genetic data from the iPSYCH cohort (N=117,542), we examined whether maternal health diagnoses are associated with offspring polygenic scores (PGS) for autism. Such associations between maternal health and offspring autism would indicate shared genetic factors and the possibility of genetic confounding in the observational associations. We also tested such associations with PGSs for other neuropsychiatric and neurodevelopmental conditions that are genetically correlated with autism, but with better-powered PGS (due to larger GWAS sample sizes and likely more polygenic genetic architecture), as well as height, a negative control. Several maternal diagnoses were nominally associated with autism PGS in the child, including certain obstetric complications, asthma, and obesity. After adjustment for multiple testing, the only statistically significant results were those between maternal diagnoses, predominantly psychiatric, and other neuropsychiatric and neurodevelopmental PGSs in the child. Sensitivity analyses confirmed the robustness of our results across exposure windows, diagnostic settings, and socioeconomic adjustments. These findings indicate that maternal diagnoses associated with autism partially reflect shared genetic liabilities between mothers and their children. However, such genetic effects, as captured by child PGS, do not fully explain the observed associations, suggesting additional factors such as non-genetic familial factors, rare variants, and indirect effects.

16
lncRNA MANCR isoforms selectively mediate multiple levels of epigenomic and P53-responsive transcriptional control in triple negative breast cancer

Pacht, E.; Warren, J.; Toor, R.; Glass, K. C.; Greenyer, H.; Fritz, A.; Banerjee, B.; Frietze, S. C.; Lian, J.; Gordon, J.; Stein, G.; Stein, J.

2026-04-08 cancer biology 10.64898/2026.04.06.716674 medRxiv
Top 0.3%
1.8%

Long noncoding RNAs (lncRNAs) are important regulators of gene expression and are frequently dysregulated in cancer. The mitotically associated lncRNA MANCR is highly expressed in aggressive cancers and contributes to genomic instability in triple-negative breast cancer (TNBC), but the molecular mechanisms underlying its activity remain poorly defined. Here we integrate computational and experimental approaches to examine the structure and regulatory interactions of MANCR isoforms. Analysis of transcriptomic datasets revealed tumor-type-specific expression patterns for seven MANCR isoforms in breast cancer cell lines. Computational prediction of RNA secondary structures identified conserved structural features across isoforms, suggesting potential functional specialization. We identify p53 as a MANCR-interacting protein through computational docking and RNA immunoprecipitation sequencing (RIP-seq) and demonstrate that MANCR depletion reduces p53-dependent transcriptional activity. Chromatin isolation by RNA purification sequencing (ChIRP-seq) revealed 1,250 genomic regions associated with MANCR, including enrichment of p53 consensus motifs and GC-rich sequence elements. Motif analysis further identified candidate sequence features associated with MANCR-occupied chromatin regions. Computational prediction of RNA-miRNA interactions identified multiple potential miRNA binding sites across MANCR isoforms, including miR-6756-5p, which targets the androgen receptor (AR). Consistent with this prediction, AR expression decreased following MANCR knockdown in TNBC cells. Together, these results suggest that MANCR isoforms may contribute to transcriptional regulation in TNBC through interactions with chromatin, p53 signaling pathways, and potential miRNA regulatory networks.
One Sentence Summary: Mitotically associated lncRNA (MANCR) is prevalent in aggressive cancers, interacting with DNA, P53, and miRNAs to mediate multiple levels of epigenetic transcriptional control in triple negative breast cancer.

17
The population frequency of predicted pathogenic variants in the genes associated with Autosomal Dominant Polycystic Liver Disease (ADPLD) and kidney cysts

Varughese, S.; Huang, M.; Savige, J.

2026-04-16 nephrology 10.64898/2026.04.13.26350832 medRxiv
Top 0.3%
1.8%

Autosomal dominant polycystic liver disease (ADPLD) commonly results from a pathogenic variant in one of 6 genes (GANAB, ALG8, LRP5, PRKCSH, SEC61B, SEC63). Pathogenic variants in these genes are also associated with kidney cysts, which rarely cause kidney failure, but the genes are included in cystic kidney panels. This study determined the frequency of predicted pathogenic variants in the ADPLD genes in the general population. Variants for each gene were downloaded from gnomAD and annotated with ANNOVAR. The population frequencies were calculated from the number of people with "predicted pathogenic" variants in gnomAD v.2.1.1: loss-of-function structural and copy number; null; and rare, computationally-damaging missense changes that affected a conserved residue. Frequencies were also estimated from the number of gnomAD v.4.1 variants assessed as Pathogenic or Likely pathogenic in ClinVar. Predicted pathogenic variants affected one in 95 people using our strategy and gnomAD v.2.1.1, and one in 151 with ClinVar assessments of gnomAD v.4.1 variants. LRP5 and ALG8, which are associated with a milder clinical phenotype, were the most commonly affected genes with both strategies. Predicted pathogenic variants in ADPLD appear more frequent in admixed American (one in 100), Finnish (one in 107) and African/African American (one in 130) people (p all <0.0001 compared with Europeans (one in 197)). Predicted pathogenic variants for ADPLD may be even more common because of additional unidentified causative genes. However, not all ADPLD variants result in liver cysts, nor indeed cystic kidneys, because of incomplete penetrance and variable expressivity.

18
Benchmarking scRNA-seq Copy Number Inference: A Comprehensive Evaluation and Practitioner Guide

Chang, H.-C.; Shi, Y.; Cheng, H.; Zou, J.; Chang, A. C.-C.; Schlegel, B. T.; Wang, W.; Brown, D. D.; Chen, F.; Wang, S.; Li, D.; Sai, R.; Michel, N.; Oesterreich, S.; Lee, A. V.; Tseng, G. C.

2026-04-15 cancer biology 10.64898/2026.04.12.718050 medRxiv
Top 0.3%
1.8%

Accurately inferring copy number variation (CNV) from scRNA-seq data is critical for identifying malignant cells, reconstructing tumor subclonal architecture, and uncovering the genomic drivers that dictate cancer cell biology. However, the performance of existing tools varies significantly, and current benchmarks lack the breadth of datasets and methods necessary to provide definitive guidance. We present a comprehensive benchmark of 12 CNV inference methods across 28 real datasets (>100,000 cells) and diverse synthetic datasets. By evaluating methods based on malignant cell classification accuracy, CNV inference accuracy, scalability, and robustness, we establish a definitive practitioner's guideline: allele-aware methods like Numbat excel when high-quality allelic inference can be achieved, whereas expression-centric tools such as Clonalscope, CopyKAT, inferCNV, and SCEVAN remain reliable when raw sequencing data are unavailable. Our study provides both a practical decision-making framework for researchers and a public repository of standardized CNV profiles to catalyze further methodological innovation.

19
The Power of Partnership: Democratizing Genetic Prevalence to Empower Patient Advocacy

Baxter, S. M.; Singer-Berk, M.; Glaze, C.; Russell, K.; Grant, R. H.; Groopman, E.; Lee, J.; Watts, N.; Wood, J. C.; Wilson, M.; Rare As One Network, ; Rehm, H. L.; O'Donnell-Luria, A.

2026-03-31 genetic and genomic medicine 10.64898/2026.03.30.26349539 medRxiv
Top 0.3%
1.8%

Introduction: Accurate estimation of disease prevalence is crucial for public health and therapeutic development, but traditional methods are often inaccurate. Genetic prevalence, which estimates the proportion of a population with a causal genotype using allele frequencies from population data, offers an important alternative. Methods: We partnered with 18 Rare As One patient organizations to estimate genetic prevalence for 22 autosomal recessive conditions using population data from two releases of the Genome Aggregation Database (gnomAD). To standardize and democratize these analyses, we developed the Genetic Prevalence Estimator (GeniE), a publicly available tool for accessible calculations. Results: Conservative carrier frequencies in gnomAD v4.1 ranged from 1/164 to 1/11,888. The median change in genetic prevalence frequency between v2.1 and v4.1 was 0.806. Partnership with patient advocacy groups provided critical real-world context that refined the interpretation of these estimates. Discussion: These findings highlight that genetic prevalence is not a static figure but a dynamic, evolving measure with important caveats that need to be considered. Our study underscores the necessity of re-evaluations as databases expand. By integrating patient-partnered insights with the GeniE platform, we empower the genomics community to maintain transparent, up-to-date, and actionable data for rare disease advocacy and drug development.
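
As a reader's illustration of how genetic prevalence can be derived from allele frequencies for an autosomal recessive condition, the sketch below applies the standard Hardy-Weinberg approximation. It is not taken from the preprint and is not a description of how GeniE works; the function names and the allele frequencies are hypothetical, and the calculation assumes random mating and that all listed variants are fully penetrant and causal.

```python
def recessive_prevalence(allele_freqs):
    """Hardy-Weinberg estimate for an autosomal recessive condition.

    allele_freqs: population frequencies of the causal alleles in one gene
    (hypothetical inputs; in practice these would come from a population
    database such as gnomAD).
    Returns (carrier_frequency, affected_frequency).
    """
    q = sum(allele_freqs)          # combined causal allele frequency
    carrier = 2 * q * (1 - q)      # heterozygotes: 2pq with p = 1 - q
    affected = q ** 2              # biallelic genotypes: q^2
    return carrier, affected


def one_in(freq):
    """Express a frequency as the N in '1 in N', rounded to an integer."""
    return round(1 / freq)


# Hypothetical allele frequencies, for illustration only.
example_freqs = [0.002, 0.001, 0.0005]
carrier, affected = recessive_prevalence(example_freqs)
print(f"carriers: 1 in {one_in(carrier)}; affected: 1 in {one_in(affected)}")
```

Because the estimate depends directly on which variants are counted and on their database frequencies, adding or reclassifying variants between database releases changes the result, which is consistent with the abstract's point that genetic prevalence is not a static figure.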

20
A long-read RNA sequencing and polysome profiling framework reveals transposable element-driven transcript diversity and translational rewiring in glioblastoma

Pizzagalli, M.; Sasipalli, S.; Leary, O.; Tran, L.; Haas, B.; Tapinos, N.

2026-04-21 cancer biology 10.64898/2026.04.18.719388 medRxiv
Top 0.3%
1.7%

Background: Transposable elements (TEs) account for over half of the human genome and are often derepressed in cancer. TEs can add cryptic splice sites, undergo exonization, and generate gene-TE fusion transcripts, but the combined effects of TEs on RNA processing and translation in glioblastoma stem cells (GSCs) remain incompletely elucidated. Results: We combined long-read RNA sequencing with polysome profiling in four patient-derived GSCs and two neural stem cell (NSC) controls to resolve TE-associated transcript diversity and its relationship to ribosomal engagement. Across GSCs, we identified 13,421 alternative splicing (AS) events, 3,077 of which contained TEs within 150 bp of splice junctions. AS sites proximal to TEs were associated with increased isoform switching compared to non-TE-associated AS sites (odds ratio 2.9-4.3). Moreover, AS isoforms generated from TE-proximal sites were more likely to exhibit altered ribosomal association (odds ratio 2.54). Directional shifts were observed, with shorter isoforms associating with monosome fractions and longer isoforms with polysome fractions. To enable systematic detection of gene-TE chimeric transcripts, we developed FuTER (Fusion TE Reporter), a long-read-based framework for identifying TE-associated fusions. Application to GSC datasets identified 78 GSC-enriched fusion transcripts, several supported by breakpoint-spanning reads in polysome fractions, consistent with ribosome association. Conclusions: Our data suggest that TEs correlate with abnormal splicing activity and altered ribosome engagement in glioblastoma stem cells. By integrating long-read sequencing with polysome profiling and fusion detection, we establish a framework for analysis of TE-induced transcript diversity and its effects on cancer evolution and plasticity.